Rule induction for subgroup discovery with CN2-SD
Authors
Abstract
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. This paper shows how this can be achieved by modifying the CN2 rule learning algorithm. Modifications include a new covering algorithm (weighted covering algorithm), a new search heuristic (weighted relative accuracy), probabilistic classification of instances, and a new measure for evaluating the results of subgroup discovery (area under ROC curve). The main advantage of the proposed approach is that each rule with high weighted accuracy represents a ‘chunk’ of knowledge about the problem, due to the appropriate tradeoff between accuracy and coverage, achieved through the use of the weighted relative accuracy heuristic. Moreover, unlike the classical covering algorithm, in which only the first few induced rules may be of interest as subgroup descriptors with sufficient coverage (since subsequently induced rules are induced from biased example subsets), the subsequent rules induced by the weighted covering algorithm allow for discovering interesting subgroup properties of the entire population. Experimental results on 17 UCI datasets are very promising, demonstrating large improvements in the number of induced rules, rule coverage and rule significance, as well as smaller improvements in rule accuracy and area under ROC curve.
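The two central ideas in the abstract, the weighted relative accuracy heuristic and the weighted covering algorithm, can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes the standard definition WRAcc(Cond → Class) = p(Cond) · (p(Class | Cond) − p(Class)), with probabilities estimated from example weights, and it assumes a multiplicative weight decay for covered positive examples. The function names and the `gamma` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def wracc(covered: Sequence[bool],
          positive: Sequence[bool],
          weights: Optional[Sequence[float]] = None) -> float:
    """Weighted relative accuracy of a rule Cond -> Class.

    covered[i]  -- True if example i satisfies the rule condition
    positive[i] -- True if example i belongs to the target class
    weights[i]  -- example weight (defaults to 1.0 for every example)
    """
    if weights is None:
        weights = [1.0] * len(covered)
    if not (len(covered) == len(positive) == len(weights)):
        raise ValueError("all inputs must have the same length")

    total = sum(weights)
    w_cond = sum(w for w, c in zip(weights, covered) if c)
    if total == 0 or w_cond == 0:
        return 0.0  # a rule that covers nothing carries no information

    w_class = sum(w for w, p in zip(weights, positive) if p)
    w_both = sum(w for w, c, p in zip(weights, covered, positive) if c and p)

    # p(Cond) * (p(Class | Cond) - p(Class)), estimated from weights
    return (w_cond / total) * (w_both / w_cond - w_class / total)


def decay_weights(weights: Sequence[float],
                  covered: Sequence[bool],
                  positive: Sequence[bool],
                  gamma: float = 0.5) -> List[float]:
    """Assumed multiplicative scheme: instead of removing covered positive
    examples, multiply their weight by gamma (0 < gamma < 1)."""
    return [w * gamma if (c and p) else w
            for w, c, p in zip(weights, covered, positive)]


# Toy data: 8 examples, the rule covers 3 of them, 2 of which are positive.
covered = [True, True, True, False, False, False, False, False]
positive = [True, True, False, True, False, False, False, False]

first_score = wracc(covered, positive)
new_weights = decay_weights([1.0] * 8, covered, positive)
second_score = wracc(covered, positive, new_weights)
```

Because covered examples are down-weighted rather than deleted, the same rule scores lower on the next iteration, while later rules are still evaluated against the whole population.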
Related resources
Analysis of Example Weighting in Subgroup Discovery by Comparison of Three Algorithms on a Real-life Data Set
This paper investigates the implications of example weighting in subgroup discovery by comparing three state-of-the-art subgroup discovery algorithms, APRIORI-SD, CN2-SD, and SubgroupMiner on a real-life data set. While both APRIORI-SD and CN2-SD use example weighting in the process of subgroup discovery, SubgroupMiner does not. Moreover, APRIORI-SD uses example weighting in the post-processing...
Subgroup Discovery with CN2-SD
This paper investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of the population that are sufficiently large and statistically unusual. The paper presents a subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering algorit...
APRIORI-SD: Adapting Association Rule Learning to Subgroup Discovery
This paper presents a subgroup discovery algorithm APRIORI-SD, developed by adapting association rule learning to subgroup discovery. The paper contributes to subgroup discovery, to a better understanding of the weighted covering algorithm, and the properties of the weighted relative accuracy heuristic by analyzing their performance in the ROC space. An experimental comparison with rule learn...
Using Subgroup Discovery to Analyze the UK Traffic Data
Rule learning is typically used in solving classification and prediction tasks. However, learning of classification rules can also be adapted to subgroup discovery. Such an adaptation has already been done for the CN2 rule learning algorithm. In previous work this new algorithm, called CN2-SD, has been described in detail and applied to the well-known UCI data sets. This paper summarizes the mo...
Inverted Heuristics in Subgroup Discovery
In rule learning, rules are typically induced in two phases, rule refinement and rule selection. It was recently argued that using a separate heuristic for each phase (in particular, the so-called inverted heuristic in the refinement phase) produces longer rules with comparable classification accuracy. In this paper we test the utility of inverted heuristics in the context of subgr...